-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tage #443
base: dev
Are you sure you want to change the base?
Conversation
Rebasing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally good, a few minor points
@@ -149,13 +149,13 @@ The Branch-Prediction section contains those options to parameterise the branch | |||
The current options include: | |||
|
|||
Type | |||
The type of branch predictor that is used, the options are ``Generic``, and ``Perceptron``. Both types of predictor use a branch target buffer with each entry containing a direction prediction mechanism and a target address. The direction predictor used in ``Generic`` is a saturating counter, and in ``Perceptron`` it is a perceptron. | |||
The type of branch predictor that is used, the options are ``Generic``, ``Perceptron``, and ``Tage``. Each of these types of predictor use prediction tables with each entry containing a direction prediction mechanism and a target address. The direction predictor used in ``Generic`` and ``TAGE`` is a saturating counter, and in ``Perceptron`` it is a perceptron. ``TAGE`` also uses a series of further, tagged prediction tables to provide predictions informed by greater branch histories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a good reason behind using Tage
and TAGE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not. I've udpated to Tage
throughout, as this is the capitalisation used in the config yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the creator uses all forms of capitalisation
@@ -29,10 +29,15 @@ Queue-Sizes: | |||
Load: 40 | |||
Store: 24 | |||
Branch-Predictor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some TX2 diagrams note it's use of a multi-history branch predictor. I assume this is TAGE-like so maybe apply this config update to the TX2 YAML as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds like it would be. I've updated the TX2 config as well.
for (uint32_t i = 0; i < numTageTables_; i++) { | ||
std::vector<TageEntry> newTable; | ||
for (uint32_t j = 0; j < (1ul << tageTableBits_); j++) { | ||
TageEntry newEntry = {2, 0, 1, 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we not want to initialise the TageEntry
with a SatCnt equal to the once used in the btb_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch
// global history (folded onto itself to make it of the correct size). | ||
uint64_t h1 = (address >> 2); | ||
uint64_t h2 = globalHistory_.getFolded(1ull << (table + 1), tageTableBits_); | ||
// Then truncat the XOR to make it fit thed esired size of an index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*the desired
// global history (folded onto itself to make it of the correct size). | ||
uint64_t h1 = (address >> 2); | ||
uint64_t h2 = globalHistory_.getFolded(1ull << (table + 1), tageTableBits_); | ||
// Then truncat the XOR to make it fit thed esired size of an index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*truncate
Could you also add tage to the a64fx_SME.yaml config |
… fixed but dynamically chosen size
….e., is now different from the first tagged table)
Merging with changes to dev
Merging with dev
Merging with dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clean PR. Nicely precisely commented and everything is very easily readable. I like the branch history class which is also well explained
if (i == 0) { | ||
history_[i] |= ((isTaken) ? 1 : 0); | ||
} else { | ||
history_[i] |= (((history_[i - 1] & (1ull << 63)) > 0) ? 1 : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need the conditional statement? After doing the AND you could shift right by 63 to get your 0 or 1. Would be slightly fewer cycles and more understandable/readable in my eyes (you may disagree)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the conditional is needed here. Whats being loaded into the uint64 depends on where it is in the vector. All but the least-significant uint64s get the MSB of the next uint64 added as the LSB. But the least-significant uint64 gets isTaken added as the LSB. However, if I'm misunderstanding your Q LMK.
* outcome, 'position' would be 0. | ||
* */ | ||
void updateHistory(bool isTaken, uint64_t position) { | ||
if (position < size_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we assert position being < size_ as above, or are there cases where this could "validly" be greater? For instance, if you are trying to update an entry that has been lost from the history because there have been too many branches in the meantime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly as you say, I don't think that this should be an assert as the core may validly try to update a history that is no longer being tracked. The reason that we should allow this is to allow the pipeline not to need to know the size of the branch history. We're already ensuring that this doesn't cause problems with our if statement on 82.
* access and manipulate large branch histories, as are needed in | ||
* sophisticated branch predictors. | ||
* | ||
* The bits of the branch history are stored in a vector of uint64_t values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"vector" should be "array"
std::vector<std::pair<uint8_t, uint64_t>> btb_; | ||
|
||
/** The bitlength of the Tagged tables' indices. | ||
* Each tagged table with have 2^bits entries. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With -> will
|
||
uint64_t TagePredictor::getTag(uint64_t address, uint8_t table) { | ||
// Hash function here is pretty arbitrary | ||
uint64_t h1 = address; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to remove the 2 LSBs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Ideally the hashes for the tag and the index should never each produce the same value for two different branches. Therefore, because the index does remove the 2 LSBs, keeping them here makes the information being passed into the hashes different and so improves the accuracy of the BP (reduces the risk of this type of accidental clashing).
Only needed for a ``Tage`` predictor. The number of tagged tables used by the predictor, in addition to a default prediction table (i.e., the BTB). Therefore, a value of 3 for ``Num-Tage-Tables`` would result in four total prediction tables: one BTB and three tagged tables. If no tagged tables are desired, it is recommended to use the ``GenericPredictor`` instead. | ||
|
||
Tage-Length | ||
Only needed for a ``Tage`` predictor. The number of bits used to tage the entries of the tagged tables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "tage" in the latter sentence meant to be that or rather "tag"
Adding a TAGE branch predictor.
Performance relative to previous best (Perceptron) summarised below: